INDEXING SYSTEM AND METHOD FOR NEAREST NEIGHBOR SEARCHES
IN HIGH DIMENSIONAL DATA SPACES 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates to methods and systems for indexing objects in high 
dimensional data spaces to respond to user queries. 

2. Description of the Related Art 

Nearest neighbor searching on high dimensional data spaces is essentially a method of 
searching for objects in a data space that are similar to a user-selected object, with the user- 
selected object defining a query. For example, using the present assignee's QBIC system, a user 
can select a digital image and use the image as a query to a data base for images that are similar 
to the user-selected digital image. In response to the query, the "k" closest images are returned, 
where "k" is an integer defined by the user or search engine designer. These "k" images are 
referred to as the "k" nearest neighbors to the image that was used as the query, and for indexing 
and search purposes they are typically considered to be multidimensional data points "p" that are
close to a multidimensional data point "q" representing the query. Other non-limiting examples 
of applications that use nearest neighbor searching include video databases, data mining, pattern 
classification, and machine learning. 
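For reference, the query itself can be stated as an exhaustive scan that compares q with every data vector. The minimal Python sketch below assumes Euclidean (L2) distance and in-memory vectors; knn_linear_scan is a hypothetical name. It is the baseline that the indexing methods discussed below try to avoid running in full.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def knn_linear_scan(
    data: Sequence[Sequence[float]], q: Sequence[float], k: int
) -> List[Tuple[float, int]]:
    """Return the k (distance, index) pairs closest to q, nearest first."""
    # Every vector is read and compared with q: O(N * d) work per query.
    return heapq.nsmallest(k, ((math.dist(p, q), i) for i, p in enumerate(data)))


data = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0)]
print(knn_linear_scan(data, (1.2, 1.2), k=2))
```

The cost of this scan grows with both the number of vectors N and the dimensionality d, which is why large image or document collections need an index or a filter in front of it.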
In any case, multidimensional indexing methods ("MIMs") have been introduced for
indexing multidimensional objects by partitioning the data space, clustering data according to the 
partitioning, and using the partitions to prune the search space to promote fast query execution. 
It will readily be appreciated that in the context of large databases that hold a high number of 
objects, the time to execute a query like the one discussed above would be excessive in the 
absence of MIMs. As recognized by the present invention, while effective for low 
dimensionalities, MIMs are not effective and indeed tend toward being counterproductive for 
objects having high dimensionalities, e.g., of ten, twenty or more. Image objects, for example, 
can have hundreds of dimensions, and text documents can have thousands of dimensions. 

Weber et al. disclose a filtering method intended to be an improvement over conventional 
MIMs in "A Quantitative Analysis and Performance Study for Similarity-Search Methods in 
High-Dimensional Spaces", Proc. of the 24th Int'l Conf. on VLDB, 1998 ("VA file" method).
In the VA file method, compact approximations of data objects (also referred to as "vectors") are 
generated, and by first scanning the compact approximations, a large number of the larger actual 
vectors can be filtered out such that only a small number of vectors need be examined. In this 
way, query execution time is minimized. 
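To make the idea of a compact approximation concrete, the following is a simplified sketch of a VA-file style approximation, not a reproduction of Weber et al.'s scheme. It assumes coordinates normalized to [0, 1] and uniform slices (2^b per dimension), and the function names are hypothetical. Each vector is replaced by its slice indices, and a lower bound on its distance to the query is computed from the slice boundaries alone, so vectors whose lower bound is already too large can be filtered out without reading them.

```python
import math
from typing import List, Sequence


def va_approximation(p: Sequence[float], b: int) -> List[int]:
    """Quantize each coordinate of p (assumed in [0, 1]) into one of 2**b slices."""
    slices = 1 << b
    # min() keeps a coordinate of exactly 1.0 in the last slice.
    return [min(int(x * slices), slices - 1) for x in p]


def va_lower_bound(approx: Sequence[int], q: Sequence[float], b: int) -> float:
    """Smallest possible L2 distance from q to any point inside the approximated cell."""
    width = 1.0 / (1 << b)
    total = 0.0
    for cell_j, qj in zip(approx, q):
        lo, hi = cell_j * width, (cell_j + 1) * width
        if qj < lo:
            total += (lo - qj) ** 2
        elif qj > hi:
            total += (qj - hi) ** 2
        # A query coordinate inside the slice contributes nothing.
    return math.sqrt(total)


a = va_approximation([0.3, 0.7], b=2)
print(a, va_lower_bound(a, [0.9, 0.1], b=2))
```

Each approximation costs b*d bits, so keeping the filter selective as d grows requires ever more bits per vector, which is the dimensionality dependence discussed next.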

The present invention has recognized, however, that the VA file method has at least two 
drawbacks. The first is that as the dimensionality of the data objects increases, the number of 
bits used in the approximations also increases significantly to facilitate adequate filtering. This 
means that the performance of the VA file method, like the performance of the above-mentioned 
MIMs, degrades significantly when applied to high dimensional data spaces (e.g., dimensions over 
100). The second drawback with the VA file method is that its filtering capability decreases in 
the case of clustered data such as multimedia data. The present invention, having recognized the
above-noted deficiencies in the prior art, has provided the improvements disclosed below. 

SUMMARY OF THE INVENTION 

The invention is a general purpose computer programmed according to the inventive steps 
herein. The invention can also be embodied as an article of manufacture - a machine component 
- that is used by a digital processing apparatus and which tangibly embodies a program of 
instructions that are executable by the digital processing apparatus to undertake the present 
invention. This invention is realized in a critical machine component that causes a digital 
processing apparatus to perform the inventive method steps herein. The invention is also a 
computer-implemented method for undertaking the acts disclosed below. 

Accordingly, a computer is programmed to undertake method acts for querying for data 
using a query. The method acts undertaken by the computer include, for at least some data 
vectors in a data space, generating respective approximations in polar coordinates. Also, the 
method acts executed by the computer include returning "k" nearest neighbors to the query based 
on the approximations. 

In a preferred embodiment, the method acts executed by the computer further include 
dividing the data space into plural cells, and approximating at least one data point in at least one 
cell by using polar coordinates with respect to the at least one cell. Accordingly, the method is 
referred to as "local polar coordinate-based approximation". In a particularly preferred 
embodiment, the data space has "d" dimensions, and a number of "b" bits to be assigned to each
cell is determined. Then, the data space is divided into 2^{bd} cells.
As disclosed in greater detail below, each approximation defines a lower bound d_min and
an upper bound d_max, and the method acts executed by the computer include generating a
candidate set of approximations based on the lower bound d_min and upper bound d_max of the
approximations. Moreover, the query can be represented by a query vector q, and the computer
adds a first approximation having a first lower bound d_min1 to the candidate set if d_min1 < k-NN_dist(q),
wherein k-NN_dist(q) is the k-th largest distance between the query vector q and the nearest
neighbor vectors p encountered so far. The candidate set is then used to return "k" nearest
neighbor vectors p to the query vector q. With this invention, not all vectors p corresponding
to approximations in the candidate set are examined to return the "k" nearest neighbors.

In another aspect, a computer program product includes a program of instructions that 
have computer readable code means for generating local polar coordinate-based approximations 
of at least some data vectors p in at least one data set having a dimensionality of "d". The local 
polar coordinates are independent of "d". Computer readable code means use the approximations 
to return "k" nearest neighbors to a query. 

In yet another aspect, a computer-implemented method is disclosed for finding, in a data 
space, "k" closest data vectors p to a query vector q. The method includes rendering 
approximations of at least some of the data vectors p using local polar coordinates, and filtering 
the approximations. After filtering, the "k" closest data vectors p are returned. 

The details of the present invention, both as to its structure and operation, can best be 
understood in reference to the accompanying drawings, in which like reference numerals refer 
to like parts, and in which: 
BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram of the present system;

Figure 2 is a flow chart of the logic for generating the LPC file;

Figure 3 is a graph schematically showing the data space cells with polar coordinates in two dimensions;

Figure 4 is a graph schematically showing a single data space cell in three dimensions;

Figure 5 is a flow chart showing the logic of generating the candidate set;

Figure 6 is a graph schematically showing a data space cell with polar coordinates and minimum and maximum distances in two dimensions;

Figure 7 is a graph schematically showing a data space cell with polar coordinates and minimum and maximum distances in three dimensions; and

Figure 8 is a flow chart showing the logic of finding the "k" nearest neighbors to a query "q" using the candidate set.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Referring initially to Figure 1, a system is shown, generally designated 10, that includes
a data space server 12 having access to a local or remote software-implemented index module
14. Using an input device 16, a user of a user computer 18 can input a query for data from a
database 20, and the server 12, by means of the index module 14, accesses the database 20 and
returns the requested data to the user computer 18 for display or storage thereof on an output
device, such as a monitor 22. More specifically, as set forth further below, the user computer
18 sends a query for data essentially using a query vector q, with the index module 14 returning
the "k" nearest neighbors (referred to herein as the "k" data vectors p that are closest to q) in
response. The query vector q can be, e.g., an example image for which the user wants close 
matches. Other applications of k-nearest neighbor searching are contemplated herein, such as but 
not limited to document retrieval, data mining, pattern classification, and machine learning. 

As intended herein, either or both of the server 12/user computer 18 can be a server 
computer made by International Business Machines Corporation (IBM) of Armonk, N.Y. Other 
digital processors, however, may be used, such as personal computers, laptop computers,
mainframe computers, palmtop computers, personal assistants, or any other suitable processing
apparatus. The input device 16 can be established by one or more of: a computer
mouse, keyboards, keypads, trackballs, and voice recognition devices. Output devices other than 
the monitor 22 can be used, such as printers, other computers or data storage devices, and 
computer networks. 

In any case, the processor of the server 12 accesses the module 14 to undertake the logic 
of the present invention, which may be executed by a processor as a series of computer- 
executable instructions. The instructions may be contained on a data storage device with a 
computer readable medium, such as a computer diskette having a computer usable medium with 
a program of instructions stored thereon. Or, the instructions may be stored on random access 
memory (RAM) of the computer, on a DASD array, or on magnetic tape, conventional hard disk 
drive, electronic read-only memory, optical storage device, or other appropriate data storage 
device. In an illustrative embodiment of the invention, the computer-executable instructions may 
be lines of C or C++ or Java code. 
Indeed, the flow charts herein illustrate the structure of the logic of the present invention
as embodied in computer program software. Those skilled in the art will appreciate that the flow 
charts illustrate the structures of computer program code elements including logic circuits on an 
integrated circuit, that function according to this invention. Manifestly, the invention is practiced 
in its essential embodiment by a machine component that renders the program code elements in 
a form that instructs a digital processing apparatus (that is, a computer) to perform a sequence 
of function steps corresponding to those shown. 

The logic of the present invention starts in Figure 2 at blocks 24 and 25, wherein the data space
in the database 20 is divided into 2^{bd} cells, wherein "b" is the integer number of bits assigned
to each cell per dimension and "d" is the dimensionality of the database 20. For illustration purposes,
Figure 3 shows a two-dimensional data space that has been divided into plural cells 26, while
Figure 4 illustrates a single three-dimensional cell 28. The use of two and three dimensions in
Figures 3 and 4 is for simplicity of disclosure only, it being understood that the principles set
forth herein apply to any high dimensional data spaces.
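As a rough illustration of the partitioning at blocks 24 and 25, the following Python sketch assumes (these are assumptions, not details taken from the figures) that the data space has been normalized to the unit hypercube [0, 1]^d and that "b" bits per dimension split each axis into 2^b equal slices, which yields the 2^{bd} cells described above. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def num_cells(b: int, d: int) -> int:
    """Number of cells when each of the d axes is split into 2**b slices."""
    return 2 ** (b * d)


def cell_of(p: Sequence[float], b: int) -> Tuple[int, ...]:
    """Per-axis slice indices of the cell containing p (coordinates in [0, 1])."""
    slices = 1 << b
    # min() keeps points on the upper boundary (coordinate 1.0) in the last slice.
    return tuple(min(int(x * slices), slices - 1) for x in p)


def cell_origin(cell: Sequence[int], b: int) -> List[float]:
    """Local origin "O" of a cell: its corner with the smallest coordinates."""
    width = 1.0 / (1 << b)
    return [c * width for c in cell]


print(num_cells(b=2, d=2))  # a 4 x 4 grid in two dimensions
print(cell_of([0.3, 0.8], b=2))
print(cell_origin((1, 3), b=2))
```

Under these assumptions each cell is identified by b*d bits (b per axis), which is where the 2^{bd} count comes from; only the identifiers of occupied cells ever need to be stored.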

Moving to block 30, a DO loop is entered such that for each data point, an approximation 
in local polar coordinates is generated at block 32. As shown in Figure 3, each cell 26 has a 
local origin "O" at its bottom left corner, and each cell can be represented by its coordinates 
shown in Figure 3. 

A vector p_i is generated in local polar coordinates having a radius r from the cell's local
origin "O" to the i-th data point and an angle θ between the vector and the bisecting diagonal 34
of the cell. This is illustrated in Figures 3 and 4. As a result, each vector p is represented by
an approximation a = <cell, r, θ>. At block 36, a complete local polar
coordinate (LPC) file is generated, which in the preferred embodiment can be an array of
approximations a representing vectors p.
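A minimal sketch of block 32 under the same assumptions as before (unit hypercube, 2^b equal slices per dimension); lpc_approximation is a hypothetical name. It computes the cell, the radius r from the local origin "O", and the angle θ to the cell diagonal, whose direction is (1, ..., 1) because "O" is the corner with the smallest coordinates.

```python
import math
from typing import Sequence, Tuple


def lpc_approximation(p: Sequence[float], b: int) -> Tuple[Tuple[int, ...], float, float]:
    """Return the approximation <cell, r, theta> of p, with p assumed in [0, 1]^d."""
    slices = 1 << b
    width = 1.0 / slices
    cell = tuple(min(int(x * slices), slices - 1) for x in p)
    # Vector from the cell's local origin O to p.
    v = [x - c * width for x, c in zip(p, cell)]
    r = math.sqrt(sum(x * x for x in v))
    if r == 0.0:
        return cell, 0.0, 0.0  # p sits on O, so any angle describes it
    # Angle between v and the cell diagonal, whose direction is (1, ..., 1).
    cos_theta = sum(v) / (r * math.sqrt(len(v)))
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp rounding noise
    return cell, r, theta


print(lpc_approximation([0.75, 0.5], b=1))
```

For the point (0.75, 0.5) with b = 1 this yields cell (1, 1), r = 0.25 and θ ≈ π/4. Because every component of the local vector is non-negative, θ never exceeds arccos(1/sqrt(d)), and the approximation stores only the cell identifier plus the two numbers r and θ, however large d is.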

As can be appreciated in reference to Figures 3 and 4, an approximation is a set of points
having radius r and angle θ within a cell. In the two dimensional illustration shown in Figure
3, the approximation represents two points p and p' which have polar coordinates (r, θ) and
which are symmetric with respect to the diagonal 34. This is in contrast to the above-mentioned 
VA method, wherein the approximation would represent the entire cell. In the three dimensional 
illustration of Figure 4, the approximation is represented by a circle 38 around the diagonal of 
the cell, whereas the above-mentioned VA file method would produce an approximation that 
would consist of the entire cube. Thus, it will readily be appreciated that the present method 
produces more efficient approximations than does the VA file method. In higher dimensions, an 
approximation in the present invention is a set of points on a hypersphere. 
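The symmetry described for Figure 3 can be checked numerically. In two dimensions, reflecting a point across the cell diagonal swaps its local coordinates, and the sketch below (local_polar is a hypothetical helper) shows that both points produce the same (r, θ) pair, so a single approximation stands for both.

```python
import math
from typing import Sequence, Tuple


def local_polar(v: Sequence[float]) -> Tuple[float, float]:
    """(r, theta) of a local vector v measured from a cell's origin O,
    theta being its angle to the cell diagonal direction (1, ..., 1)."""
    r = math.sqrt(sum(x * x for x in v))
    cos_theta = sum(v) / (r * math.sqrt(len(v)))
    return r, math.acos(max(-1.0, min(1.0, cos_theta)))


# In two dimensions, reflection across the diagonal swaps the local coordinates.
p_local = (0.10, 0.35)
p_mirror = (0.35, 0.10)
print(local_polar(p_local))
print(local_polar(p_mirror))
```

In three dimensions, cyclically permuting the local coordinates is a rotation about the diagonal, so the three permuted points also share one (r, θ) pair and lie on the circle 38 of Figure 4.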

Figures 5-7 illustrate how the preferred LPC flat file generated at block 36 in Figure 2
is used upon receipt of a query, represented by a query vector q. Commencing at block 40, a
minimum distance d_min and a maximum distance d_max are computed for each approximation.
These distances are the lower and upper bounds, respectively, on the distance between the
respective data vector p and the query vector q. The minimum distance d_min is

d_{\min} = \left[ |p|^2 + |q|^2 - 2|p||q|\cos(\theta_1 - \theta_2) \right]^{1/2}

wherein the angle θ_1 is the angle between the cell diagonal and the data vector p, the angle θ_2
is the angle between the cell diagonal and the query vector q, and |p| and |q| are measured from
the cell's local origin "O", as shown in Figure 6 for the two dimensional case and Figure 7 for
the three dimensional case. On the other hand, the maximum distance d_max is

d_{\max} = \left[ |p|^2 + |q|^2 - 2|p||q|\cos(\theta_1 + \theta_2) \right]^{1/2}
Without loss of generality, these properties hold for any number of dimensions. In the
three dimensional case shown in Figure 7, the points "A" and "C" (the points of the approximation
at which the bounds d_min and d_max are attained), the origin "O", the point "B" (the endpoint of
the query vector q), and the point "D" (the corner of the cell opposite to the origin "O") all lie
in the same plane.
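The two bounds follow from the law of cosines. The angle between q and any point of the approximation can be no smaller than |θ_1 - θ_2| and no larger than θ_1 + θ_2, and the distance grows with that angle, so substituting the two extreme angles yields d_min and d_max in any number of dimensions. A minimal Python sketch, assuming the query has already been expressed relative to the cell's local origin "O" (q_local = q - O); lpc_bounds is a hypothetical name.

```python
import math
from typing import Sequence, Tuple


def lpc_bounds(r: float, theta1: float, q_local: Sequence[float]) -> Tuple[float, float]:
    """Lower and upper bounds on the L2 distance between the query and any point
    of the approximation <cell, r, theta1>.

    q_local is the query vector measured from the same cell's local origin O.
    """
    q_norm = math.sqrt(sum(x * x for x in q_local))
    if q_norm == 0.0:
        return r, r  # query sits on O: every approximated point is at distance r
    cos_t2 = sum(q_local) / (q_norm * math.sqrt(len(q_local)))
    theta2 = math.acos(max(-1.0, min(1.0, cos_t2)))  # angle of q to the diagonal
    base = r * r + q_norm * q_norm
    # Smallest angle |theta1 - theta2| gives d_min; largest angle theta1 + theta2 gives d_max.
    d_min_sq = base - 2.0 * r * q_norm * math.cos(theta1 - theta2)
    d_max_sq = base - 2.0 * r * q_norm * math.cos(theta1 + theta2)
    return math.sqrt(max(0.0, d_min_sq)), math.sqrt(max(0.0, d_max_sq))


d_min, d_max = lpc_bounds(0.25, math.pi / 4, (0.0, 1.0))
print(d_min, d_max)
```

Here the approximation covers the points (0.25, 0) and (0, 0.25) of the cell, whose distances to the query (0, 1) are exactly the two bounds, about 0.75 and 1.03. In higher dimensions the bounds bracket the true distance without the approximation having to store more than r and θ.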

With the above understanding in mind, the logic moves from block 40 to decision
diamond 42, wherein for the candidate near neighbor under test it is determined whether the
corresponding minimum distance d_min is greater than the k-th largest distance k-NN_dist(q) between
the query vector q and the data vectors p in an initially null answer set. The distance k-NN_dist(q)
is initialized to an appropriately large value. This test can be thought of as a coarse test, which,
if positive, leads to the immediate elimination of the candidate at block 44. The next candidate
is retrieved at block 46, and the logic loops back to decision diamond 42 to test the next
candidate.

On the other hand, if the candidate passes the test at decision diamond 42, indicating that
the candidate might be a k-nearest neighbor, the candidate is added to a candidate set at block
48. Then, the candidate's maximum distance is compared to the k-th largest distance k-NN_dist(q)
at decision diamond 50, and if the candidate's maximum distance is equal to or greater than the
k-th largest distance k-NN_dist(q), the logic loops back to block 46 to retrieve the next candidate for
test.

In contrast, if the candidate's maximum distance is less than the k-th largest distance
k-NN_dist(q), indicating that the candidate is probably one of the "k" near neighbors being sought,
the data vector p that corresponds to the candidate is added to an answer set "knn" at block 52.
The answer set "knn" can be ordered by distance (in this stage, the upper bound d_max) between the
query vector q and each data vector p in the set. Then, at block 54 the k-th largest distance
k-NN_dist(q) is potentially recomputed by setting it equal to the distance corresponding to the k-th
vector p in the answer set.
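The first pass of Figure 5 can be sketched as a single loop over precomputed bounds. This is an illustrative Python sketch, not the patented implementation: it assumes each approximation has already been reduced to a hypothetical (oid, d_min, d_max) triple, keeps only the distances (not the object identifiers) in the answer list, and uses the "greater than" elimination test of decision diamond 42.

```python
import bisect
import heapq
import math
from typing import List, Sequence, Tuple


def stage1_filter(
    bounds: Sequence[Tuple[int, float, float]], k: int
) -> Tuple[List[Tuple[float, int]], float]:
    """First pass over the (oid, d_min, d_max) bounds of every approximation.

    Returns the candidate set as a min-heap of (d_min, oid) pairs and the
    final value of k-NN_dist(q), an upper bound on the k-th nearest distance.
    """
    knn: List[float] = [math.inf] * k  # k smallest upper bounds so far, ascending
    knn_dist = math.inf                 # k-NN_dist(q), always equal to knn[-1]
    candidates: List[Tuple[float, int]] = []
    for oid, d_min, d_max in bounds:
        if d_min > knn_dist:            # coarse test: cannot be a k-nearest neighbor
            continue
        heapq.heappush(candidates, (d_min, oid))
        if d_max < knn_dist:            # certainly closer than the current k-th bound
            bisect.insort(knn, d_max)   # ordered insertion into the answer list
            knn.pop()                   # discard the previous k-th entry
            knn_dist = knn[-1]
    return candidates, knn_dist


bounds = [(0, 1.0, 2.0), (1, 0.5, 0.8), (2, 3.0, 4.0), (3, 0.7, 1.1), (4, 2.5, 2.6)]
cands, bound = stage1_filter(bounds, k=2)
print(sorted(oid for _, oid in cands), bound)
```

In this example objects 2 and 4 never enter the candidate set, and k-NN_dist(q) tightens from infinity to 2.0 and then to 1.1 as the upper bounds of objects 1 and 3 arrive. No data vector is read during this pass.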

After the logic of Figure 5, it will be appreciated that all candidate approximations have
been tested, and most have been eliminated. In other words, most data points have been filtered
out. Those that remain have been added to the candidate set, and the "best" of the candidates in
the candidate set (as indicated by having a relatively small d_max) have been added to the answer
set. The next stage of the logic, shown in Figure 8, is then commenced at block 56, wherein the
k-th largest distance k-NN_dist(q) is set equal to the maximum distance of the p vectors in the
answer set "knn". Then, a DO loop is entered at block 58 wherein the actual data points
represented by the candidates in the candidate set are scanned in increasing order of their lower
bounds d_min. The next candidate is retrieved at block 60, and at decision diamond 62 the distance
between the data vector p under test and the query vector q is compared to the k-th largest
distance k-NN_dist(q). If it is not less than k-NN_dist(q), the logic loops back to block 60 to
retrieve the next candidate in the candidate set. On the other hand, if the candidate passes the
test at decision diamond 62, it is inserted into the answer set "knn" at block 64, and the k-th
largest distance k-NN_dist(q) is recomputed at block 66. The next candidate is then retrieved at
block 60 for test. Owing to the ordering by distance in the candidate set and answer set, the
logic can end as soon as a candidate is encountered whose lower bound d_min exceeds the k-th
distance k-NN_dist(q), such that not all candidates in the candidate set need be tested.
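The refinement of Figure 8 can likewise be sketched in Python. This sketch assumes a hypothetical fetch(oid) accessor standing in for reading vector p from the database. As a small variation on the flow chart, it keeps the tighter of the first-pass bound and the current k-th distance and uses non-strict comparisons, so a neighbor whose distance exactly equals the bound is not discarded.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple


def stage2_refine(
    candidates: List[Tuple[float, int]],
    knn_dist: float,
    k: int,
    fetch: Callable[[int], Sequence[float]],
    q: Sequence[float],
) -> List[Tuple[float, int]]:
    """Second pass: visit candidates in increasing d_min order, reading real vectors.

    candidates: (d_min, oid) pairs from the first pass, in any order.
    knn_dist:   upper bound on the k-th nearest distance from the first pass.
    fetch:      hypothetical accessor returning the full vector for an oid.
    Returns up to k (distance, oid) pairs, nearest first.
    """
    heap = list(candidates)
    heapq.heapify(heap)                   # min-heap on d_min
    answer: List[Tuple[float, int]] = []  # max-heap via negated distances
    bound = knn_dist
    while heap and heap[0][0] <= bound:   # stop once no candidate can qualify
        _, oid = heapq.heappop(heap)
        p = fetch(oid)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
        if len(answer) < k:
            heapq.heappush(answer, (-dist, oid))
        elif dist < -answer[0][0]:
            heapq.heapreplace(answer, (-dist, oid))  # evict the current k-th
        if len(answer) == k:
            bound = min(bound, -answer[0][0])        # recompute k-NN_dist(q)
    return sorted((-neg, oid) for neg, oid in answer)
```

Passing every object as a candidate with an infinite bound turns this into an exact but unpruned scan; feeding it the candidate set and bound from the first pass is what lets it stop after reading only a few vectors.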

A pseudocode representation (with comments) of the logic of Figures 5 and 8 is as 
follows: 
Algorithm k_NN_Search (q: vector, k: integer)
{
    // Variables used in the algorithm
    // k-NN_dist(q): the k-th largest distance between the query vector q and the vectors p
    //     encountered so far
    // N: the number of vectors in the database
    // knn: answer list to maintain the nearest k vectors encountered so far and their
    //     distances to the query vector q
    // cand_list: min heap to maintain the candidate set
    // c: a candidate to insert into the cand_list
    // c.oid: identifier assigned to uniquely identify the candidate c
    // nn: a near neighbor to insert into the knn
    // MAX: a value that exceeds the largest possible distance between any two points within
    //     the database

    // Stage 1
    // The primary purpose of this stage is to build the cand_list for Stage 2. For this
    // purpose, we use k-NN_dist(q), whose initial value is the largest possible distance
    // between any two points within the database. The value of k-NN_dist(q) is updated
    // dynamically as new candidates are inserted into the cand_list.

    for i := 0 to k-1 do {
        knn[i].dist := MAX;
    }
    k-NN_dist(q) := MAX;

    for every approximation a in the approximation set {
        Compute the lower and upper bounds a.d_min and a.d_max of a;
        if (a.d_min < k-NN_dist(q)) {
            Insert [c := {a.oid, a.d_min, a.d_max}] into the candidate set cand_list;
            if (a.d_max < k-NN_dist(q)) {
                // The following is an ordered insertion in the knn array, i.e., the new
                // element is inserted into the correct position with respect to the
                // distance in knn.
                Insert the near neighbor [nn := {oid = c.oid, dist = c.d_max}] into the answer set knn;
                // Update k-NN_dist(q) after each insertion, if it gets smaller.
                k-NN_dist(q) := the distance of the k-th nearest neighbor in the answer set knn;
            }
        }
    }

    // Stage 2
    // Scan the cand_list in increasing order of d_min to find the k nearest neighbors to the
    // query point q. The scanning procedure (the while loop in the code) ends when a
    // candidate c is encountered whose lower bound c.d_min exceeds the k-th distance
    // k-NN_dist(q) in the answer set.

    for i := 0 to k-1 do {
        knn[i].dist := MAX;
    }

    while (get the candidate c from the candidate set cand_list and c.d_min < k-NN_dist(q)) do {
        Read vector p corresponding to c.oid;
        if (L2(p, q) < k-NN_dist(q)) {
            Insert the near neighbor [nn := {oid = c.oid, dist = L2(p, q)}] into the answer set knn;
            k-NN_dist(q) := the distance of the k-th nearest neighbor in the answer set knn;
        }
    }
}

While the particular INDEXING SYSTEM AND METHOD FOR NEAREST NEIGHBOR 
SEARCHES IN HIGH DIMENSIONAL DATA SPACES as herein shown and described in detail 
is fully capable of attaining the above-described objects of the invention, it is to be understood 
that it is the presently preferred embodiment of the present invention and is thus representative 
of the subject matter which is broadly contemplated by the present invention, that the scope of 
the present invention fully encompasses other embodiments which may become obvious to those 
skilled in the art, and that the scope of the present invention is accordingly to be limited by 
nothing other than the appended claims, in which reference to an element in the singular is not 
intended to mean "one and only one" unless explicitly so stated, but rather "one or more". All 
structural and functional equivalents to the elements of the above-described preferred embodiment 
that are known or later come to be known to those of ordinary skill in the art are expressly 
incorporated herein by reference and are intended to be encompassed by the present claims. 
Moreover, it is not necessary for a device or method to address each and every problem sought 
to be solved by the present invention, for it to be encompassed by the present claims. 
Furthermore, no element, component, or method step in the present disclosure is intended to be 
dedicated to the public regardless of whether the element, component, or method step is explicitly 
recited in the claims. No claim element herein is to be construed under the provisions of 35 
U.S.C. §112, sixth paragraph, unless the element is expressly recited using the phrase "means for"
or, in the case of a method claim, the element is recited as a "step" instead of an "act". 
WE CLAIM: 
CLAIMS



1. A computer programmed to undertake method acts for querying for data using a query, the method acts undertaken by the computer including:
for at least some data vectors in a data space, generating respective approximations in polar coordinates; and
based on the approximations, returning "k" nearest neighbors to the query.

2. The computer of Claim 1, wherein the method acts further comprise:
dividing the data space into plural cells; and
representing at least one data point in at least one cell in polar coordinates with respect to the at least one cell.

3. The computer of Claim 2, wherein the data space has "d" dimensions and the method acts further comprise:
determining a number of "b" bits to be assigned to each cell; and
dividing the data space into 2^{bd} cells.

4. The computer of Claim 1, wherein each approximation defines a lower bound d_min, and the method acts further comprise:
generating a candidate set of approximations based at least on the lower bounds d_min of the approximations.
5. The computer of Claim 4, wherein the query can be represented by a query vector q, and the method acts further comprise:
adding a first approximation having a first lower bound d_min1 to the candidate set if d_min1 < k-NN_dist(q), wherein k-NN_dist(q) is the k-th largest distance between the query vector q and nearest neighbor vectors p.

6. The computer of Claim 5, wherein the method acts further comprise using the candidate set to return "k" nearest neighbor vectors p to the query vector q.

7. The computer of Claim 6, wherein not all vectors p corresponding to approximations in the candidate set are examined to return the "k" nearest neighbors.

8. A computer program product including a program of instructions having:
computer readable code means for generating approximations including local polar coordinates of at least some data vectors p in at least one data set having a dimensionality of "d", the local polar coordinates being independent of "d"; and
computer readable code means for using the approximations to return "k" nearest neighbors to a query.
9. The computer program product of Claim 8, wherein the means for generating generates respective approximations of data vectors p in local polar coordinates.



10. The computer program product of Claim 9, further comprising:
computer readable code means for dividing the data space into plural cells; and
computer readable code means for representing each approximation in polar coordinates with respect to one of the cells.

11. The computer program product of Claim 10, wherein the data space has "d" dimensions, further comprising:
computer readable code means for determining a number of "b" bits to be assigned to each cell; and
computer readable code means for dividing the data space into 2^{bd} cells.

12. The computer program product of Claim 9, wherein each approximation defines a lower bound d_min and an upper bound d_max, and the product further comprises:
computer readable code means for generating a candidate set of approximations based at least on the lower bounds d_min and upper bounds d_max of the approximations.

13. The computer program product of Claim 12, further comprising:
computer readable code means for adding a first approximation having a first lower bound d_min1 to the candidate set if d_min1 < k-NN_dist(q), wherein k-NN_dist(q) is the k-th largest distance between the query vector q and nearest neighbor vectors p associated with approximations in the candidate set.

14. The computer program product of Claim 13, further comprising computer readable code means for using the candidate set to return "k" nearest neighbor vectors p to the query vector q.

15. A computer-implemented method for finding, in a data space, "k" closest data vectors p to a query vector q, comprising:
rendering approximations of at least some of the data vectors p using local polar coordinates;
filtering the approximations; and
after filtering, returning the "k" closest data vectors p.

16. The method of Claim 15, further comprising:
dividing the data space into plural cells; and
representing each approximation in polar coordinates with respect to one of the cells.

17. The method of Claim 16, wherein the data space has "d" dimensions and the method further comprises:
determining a number of "b" bits to be assigned to each cell; and
dividing the data space into 2^{bd} cells.

18. The method of Claim 15, wherein each approximation defines a lower bound d_min, and the method further comprises:
generating a candidate set of approximations based at least on the lower bounds d_min of the approximations.

19. The method of Claim 18, further comprising:
adding a first approximation having a first lower bound d_min1 to the candidate set if d_min1 < k-NN_dist(q), wherein k-NN_dist(q) is the k-th largest distance between the query vector q and nearest neighbor vectors p associated with approximations in the candidate set.

20. The method of Claim 19, further comprising using the candidate set to return "k" nearest neighbor vectors p to the query vector q.

21. The method of Claim 20, wherein not all data vectors p corresponding to approximations in the candidate set are examined to return the "k" nearest neighbor vectors p.

22. The computer of Claim 4, wherein each approximation defines an upper bound d_max, and the method acts further comprise:
generating a candidate set of approximations based at least on the upper bounds d_max of the approximations.

23. The computer program product of Claim 12, wherein each approximation defines an upper bound d_max, and the product further comprises:
computer readable code means for generating a candidate set of approximations based at least on the upper bounds d_max of the approximations.

24. The computer of Claim 1, wherein each approximation defines an upper bound d_max, and the method acts further comprise:
generating a candidate set of approximations based at least on the upper bounds d_max of the approximations.



IBM Case No. AM9-99-0217 






INDEXING SYSTEM AND METHOD FOR NEAREST NEIGHBOR SEARCHES 
IN HIGH DIMENSIONAL DATA SPACES 

ABSTRACT 

Vectors representing objects in n-dimensional space are approximated by local polar 
coordinates on partitioned cells of the data space in response to a query, e.g., a query data vector 
entered with a request to find "k" nearest neighbors to the query vector. A set of candidate near 
neighbors is generated using the approximations, with the local polar coordinates being 
independent of the dimensionality of the data space. Then, an answer set of near neighbors is 
returned in response to the query. Thus, the present invention acts as a filter to reduce the 
number of actual data vectors in the data set that must be considered in responding to the query. 
DECLARATION AND POWER OF ATTORNEY FOR PATENT APPLICATION

As a below named inventor, I hereby declare that:

My residence, post office address and citizenship are as stated below next to my name.

I believe I am the original, first and sole inventor (if only one name is listed below) or an original, first and joint inventor (if plural names are listed below) of the subject matter
which is claimed and for which a patent is sought on the invention entitled

INDEXING SYSTEM AND METHOD FOR NEAREST NEIGHBOR SEARCHES IN HIGH DIMENSIONAL DATA SPACES

the specification of which is attached hereto unless the following box is checked:
was filed on

as United States Application Number or PCT International Application Number

and was amended on _ (if applicable).

I hereby state that I have reviewed and understand the contents of the above identified specification, including the claims, as amended by any amendment referred to above.

I acknowledge the duty to disclose information which is material to patentability as defined in 37 CFR §1.56.

I hereby claim foreign priority benefits under 35 USC §119(a)-(d) or §365(b) of any foreign application(s) for patent or inventor's certificate, or §365(a) of any PCT
International application which designated at least one country other than the United States, listed below and have also identified below, by checking the box, any foreign
application for patent or inventor's certificate, or PCT International application having a filing date before that of the application on which priority is claimed.

Prior Foreign Application(s):

(Number) (Country) (Day/Month/Year Filed)

I hereby claim the benefit under 35 USC §119(e) of any United States provisional application(s) listed below:

Provisional Application(s):

(Application Number) (Filing Date)

I hereby claim the benefit under 35 USC §120 of any United States application(s), or §365(c) of any PCT International application designating the United States, listed below
and, insofar as the subject matter of each of the claims of this application is not disclosed in the prior United States or PCT International application in the manner provided
by the first paragraph of 35 USC §112, I acknowledge the duty to disclose information which is material to patentability as defined in 37 CFR §1.56 which became available
between the filing date of the prior application and the national or PCT International filing date of this application:

(Application Number) (Filing Date) (Status - patented, pending, abandoned)

Power of Attorney: 

I hereby appoint the following attorney(s) and/or agent(s) to prosecute this application and to transact all business in the Patent and Trademark Office connected therewith.



Priority Not Claimed 



Thomas R. Berthold (#28,689) 

Richard M. Ludwin (#33,010) 

Marc D. McSwain (#44,929)

Khanh Q. Tran (#41,352)

John L. Rogitz (#33,549)

Alison D. Mortinger (#39,306) 
Docket No. AM9-99-0217



Address all telephone calls to:

John L. Rogitz
(619) 338-8075



Address all correspondence to:

John L. Rogitz
Rogitz & Associates
750 B Street, Suite 3120
San Diego, California 92101



I hereby declare that all statements made herein of my own knowledge are true and that all statements made on information and belief are believed to be true; and further that
these statements were made with the knowledge that willful false statements and the like so made are punishable by fine or imprisonment, or both, under Section 1001 of Title
18 of the United States Code and that such willful false statements may jeopardize the validity of the application or any patent issued thereon.



Full name of sole or first inventor: GUANG-HO CHA

Inventor's signature:

Residence: 344-1 Jangjun-Dong, Gumjung-Gu, Pusan, 609-393, Republic of Korea

Date:

Citizenship: Republic of Korea (South Korea)

Post Office Address: Same

Full name of second inventor: CHIN-WAN CHUNG

Inventor's signature:

Date:

Residence: Korean Advanced Institute of Technology, Dept. of Computer Science, 373-1 Kusong-Dong, Yusong-Gu, Taejon 305-701

Citizenship: United States

Post Office Address: Same



Full name of third inventor: DRAGUTIN PETKOVIC

Inventor's signature:

Residence: 13591 Old Tree Way, Saratoga, California 95070

Citizenship: United States



Post Office Address: Same 
Full name of fourth inventor: XIAOMING ZHU

Inventor's signature:

Residence: 1407 Hackman Way, San Jose, California 95129

Citizenship: People's Republic of China

Post Office Address: Same